Last time we discussed the minimax search method for searching trees in adversarial games. If the game is truly zero-sum, so that the utility in a terminal state for one player is always the negative of the utility for the other player, then the alternating maximizing and minimizing steps can be replaced by the same maximizing step at every level, provided we negate the value returned by each recursive call. This variant is called negamax.
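The equivalence rests on the identity $\max(a, b) = -\min(-a, -b)$. Here is a minimal sketch (the tree values below are made up for illustration) showing that minimax and negamax compute the same value on a small two-ply tree:

```python
# A two-ply game tree: the maximizer picks a branch, then the
# minimizer picks a leaf.  Leaf values are utilities for the maximizer.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]

# Explicit minimax: maximize over branches, minimize within each branch.
minimax_value = max(min(leaves) for leaves in tree)

# Negamax: the same single maximizing step at every level, negating the
# value returned from the level below.  At the bottom the leaves are
# negated once so they are in the minimizer's context.
negamax_value = max(-max(-leaf for leaf in leaves) for leaves in tree)

print(minimax_value, negamax_value)  # → 3 3
```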
Here is an illustration of Negamax applied to Tic-Tac-Toe.
from IPython.display import IFrame
IFrame("http://www.cs.colostate.edu/~anderson/cs440/notebooks/negamax.pdf", width=800, height=600)
Here is a Python implementation.
### Assumes that the argument 'game' is an object with the following methods
# game.get_moves()
# game.make_move(move) changes lookahead player
# game.unmake_move(move) changes lookahead player
# game.change_player() changes next turn player
# game.get_utility()
# game.is_over()
# game.__str__()
inf = float('infinity')
debug = False   # set to True to print a trace of the search

def negamax(game, depth_left):
    if debug:
        print(' ' * (10 - depth_left), game, end='')
    # If at terminal state or depth limit, return utility value and move None
    if game.is_over() or depth_left == 0:
        if debug:
            print('terminal value', game.get_utility())
        return game.get_utility(), None
    if debug:
        print()
    # Find best move and its value from current state
    bestValue = -inf
    bestMove = None
    for move in game.get_moves():
        # Apply a move to current state
        game.make_move(move)
        # Use depth-first search to find eventual utility value and back it up.
        # Negate it because it will come back in context of next player
        value, _ = negamax(game, depth_left - 1)
        value = -value
        # Remove the move from current state, to prepare for trying a different move
        game.unmake_move(move)
        if debug:
            print(' ' * (10 - depth_left), game, 'move', move,
                  'backed up value', value)
        if value > bestValue:
            # Value for this move is better than moves tried so far from this state.
            bestValue = value
            bestMove = move
            if debug:
                print('new best')
        else:
            if debug:
                print()
    return bestValue, bestMove
And we can apply negamax to Tic-Tac-Toe using the following game class definition.
class TTT(object):

    def __init__(self):
        self.board = [' '] * 9
        self.player = 'X'
        if False:  # change to True to start from a partially played board, for testing
            self.board = ['X', 'X', ' ', 'X', 'O', 'O', ' ', ' ', ' ']
            self.player = 'O'
        self.player_look_ahead = self.player

    def locations(self, c):
        return [i for i, mark in enumerate(self.board) if mark == c]

    def get_moves(self):
        moves = self.locations(' ')
        return moves

    def get_utility(self):
        where_X = self.locations('X')
        where_O = self.locations('O')
        wins = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
                [0, 3, 6], [1, 4, 7], [2, 5, 8],
                [0, 4, 8], [2, 4, 6]]
        X_won = any(all(wi in where_X for wi in w) for w in wins)
        O_won = any(all(wi in where_O for wi in w) for w in wins)
        if X_won:
            return 1 if self.player_look_ahead == 'X' else -1
        elif O_won:
            return 1 if self.player_look_ahead == 'O' else -1
        elif ' ' not in self.board:
            return 0
        else:
            return None

    def is_over(self):
        return self.get_utility() is not None

    def make_move(self, move):
        self.board[move] = self.player_look_ahead
        self.player_look_ahead = 'X' if self.player_look_ahead == 'O' else 'O'

    def change_player(self):
        self.player = 'X' if self.player == 'O' else 'O'
        self.player_look_ahead = self.player

    def unmake_move(self, move):
        self.board[move] = ' '
        self.player_look_ahead = 'X' if self.player_look_ahead == 'O' else 'O'

    def __str__(self):
        s = '{}|{}|{}\n-----\n{}|{}|{}\n-----\n{}|{}|{}'.format(*self.board)
        return s
Now, let's try an example.
def play_game_negamax(game):
    print(game)
    while not game.is_over():
        value, move = negamax(game, 9)
        if move is None:
            print('move is None. Stopping')
            break
        game.make_move(move)
        print('\nPlayer', game.player, 'to', move, 'for value', value)
        print(game)
        game.change_player()

debug = False
play_game_negamax(TTT())
 | | 
-----
 | | 
-----
 | | 

Player X to 0 for value 0
X| | 
-----
 | | 
-----
 | | 

Player O to 4 for value 0
X| | 
-----
 |O| 
-----
 | | 

Player X to 1 for value 0
X|X| 
-----
 |O| 
-----
 | | 

Player O to 2 for value 0
X|X|O
-----
 |O| 
-----
 | | 

Player X to 6 for value 0
X|X|O
-----
 |O| 
-----
X| | 

Player O to 3 for value 0
X|X|O
-----
O|O| 
-----
X| | 

Player X to 5 for value 0
X|X|O
-----
O|O|X
-----
X| | 

Player O to 7 for value 0
X|X|O
-----
O|O|X
-----
X|O| 

Player X to 8 for value 0
X|X|O
-----
O|O|X
-----
X|O|X
For a negamax version, the meanings of alpha and beta must be swapped as players alternate, and their values must be negated.
IFrame("http://www.cs.colostate.edu/~anderson/cs440/notebooks/negamax2.pdf", width=800, height=600)
Modify Negamax to perform alpha-beta cutoffs by making these changes:

- Add two new arguments, alpha and beta, whose initial values are $-\infty$ and $\infty$.
- In each recursive call, swap alpha and beta and negate them both, and negate the value returned from recursive calls as before.
- Break out of the loop over moves (a cutoff) when bestValue is greater than or equal to beta.
- Otherwise, set alpha to the maximum of bestValue and the current alpha.

To search with a depth limit, apply an evaluation function to non-terminal states at the limit. It must estimate the expected utility (the function applied to terminal states only) of the game from the current position.
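The steps above can be sketched as follows. This is my own sketch, not code from the lecture: the function name negamax_ab and the toy Nim game used to exercise it are illustrative, assuming the same game interface as the TTT class above.

```python
inf = float('infinity')

def negamax_ab(game, depth_left, alpha=-inf, beta=inf):
    # Negamax with alpha-beta cutoffs, same game interface as before.
    if game.is_over() or depth_left == 0:
        return game.get_utility(), None
    bestValue, bestMove = -inf, None
    for move in game.get_moves():
        game.make_move(move)
        # Swap and negate alpha and beta: the opponent's alpha is our -beta,
        # and negate the returned value as in plain negamax.
        value, _ = negamax_ab(game, depth_left - 1, -beta, -alpha)
        value = -value
        game.unmake_move(move)
        if value > bestValue:
            bestValue, bestMove = value, move
        if bestValue >= beta:
            break                      # cutoff: opponent will avoid this line
        alpha = max(alpha, bestValue)
    return bestValue, bestMove

class Nim(object):
    """Toy game to exercise the search: take 1 or 2 stones from a single
    pile; the player who takes the last stone wins."""
    def __init__(self, stones=5):
        self.stones = stones
    def get_moves(self):
        return [m for m in (1, 2) if m <= self.stones]
    def make_move(self, move):
        self.stones -= move
    def unmake_move(self, move):
        self.stones += move
    def is_over(self):
        return self.stones == 0
    def get_utility(self):
        # With no stones left, the previous player took the last stone,
        # so the player now to move has lost.
        return -1 if self.stones == 0 else 0

value, move = negamax_ab(Nim(5), 10)
print(value, move)  # → 1 2  (taking 2 leaves the opponent a losing pile of 3)
```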
What makes a good evaluation function? An evaluation function is often a simple function of features of the current game position. The choice of good features is key, and it requires considerable knowledge of the game and of good strategies.
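For example, here is one possible evaluation function for Tic-Tac-Toe (my own illustration, not from the lecture): the feature is the number of winning lines still open to each player, and the value is their difference scaled into $[-1, 1]$, the range of the true utilities.

```python
WINS = [[0, 1, 2], [3, 4, 5], [6, 7, 8],
        [0, 3, 6], [1, 4, 7], [2, 5, 8],
        [0, 4, 8], [2, 4, 6]]

def evaluate(board, player):
    """Estimate the utility of a non-terminal Tic-Tac-Toe position for
    player: winning lines still open to player, minus those still open
    to the opponent, divided by the total number of lines."""
    opponent = 'O' if player == 'X' else 'X'
    open_to_player = sum(all(board[i] != opponent for i in line) for line in WINS)
    open_to_opponent = sum(all(board[i] != player for i in line) for line in WINS)
    return (open_to_player - open_to_opponent) / len(WINS)

# The empty board is symmetric, so its estimate is 0.
print(evaluate([' '] * 9, 'X'))  # → 0.0
```

Taking the center blocks four of the opponent's eight lines, so a board with only X in the center evaluates to 0.5 for X and -0.5 for O.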
A strict cutoff of search at a specific depth, with the evaluation function applied there, can lead to problems. What if the advantage in the game swings quickly just after the cutoff? If a state can be judged to be volatile in this way (a non-quiescent state), then additional search should be performed from that state before applying the evaluation function.
Current methods allow computers to search about 14 plies in chess.